CID problem
I installed PDFMiner with pip to extract text from a Japanese PDF and the following problem occurred.
I(cid:888), Intellectual Production Techniques(cid:887)Good(cid:845)Reference(cid:853)Greed(cid:864)(cid:845)(cid:880)(cid:866). People(cid:884)intellectual production techniques(cid:923)teaching(cid:849)(cid:916).
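A rough way to measure how badly a given extraction is affected is to count the `(cid:N)` placeholders in the output. The sketch below uses only the standard library; the helper name `cid_report` and the ratio it computes are my own illustration, not part of PDFMiner.

```python
import re

# PDFMiner emits "(cid:N)" where it could not map a glyph's character ID to Unicode.
CID_PATTERN = re.compile(r"\(cid:(\d+)\)")


def cid_report(text):
    """Return (list of unmapped CID numbers, share of non-space output that is unmapped)."""
    cids = [int(n) for n in CID_PATTERN.findall(text)]
    # Count the characters that did come through, ignoring whitespace.
    readable = CID_PATTERN.sub("", text)
    readable_chars = sum(1 for ch in readable if not ch.isspace())
    total = len(cids) + readable_chars
    ratio = len(cids) / total if total else 0.0
    return cids, ratio


sample = "I(cid:888), Intellectual Production Techniques(cid:887)Good(cid:845)Reference"
cids, ratio = cid_report(sample)
print(cids)  # [888, 887, 845]
```

If the same few CID numbers keep recurring, that suggests a specific set of characters (for example the hiragana) is missing from the mapping rather than the whole font.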
-----
Notes on the research process
2014
Pointing out that CMap needs to be reworked.
2015
Discussion of the ToUnicode map
Discussion of whether embedded fonts can be extracted
Reports of characters such as the two-dot shinnyō (辶) being replaced by CID
2018
Example of importing and using from within a script rather than from the command line
In this example, as in my environment, hiragana come out as CIDs.
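For reference, calling the extractor from a script can look roughly like this. It is a minimal sketch that assumes the pdfminer.six fork and its `pdfminer.high_level.extract_text` function; the 2018 example mentioned above may well have used the lower-level classes instead. The wrapper names and the `extractor` parameter are hypothetical, added only so the function can be tried without a real PDF.

```python
def extract_text_from_pdf(path, extractor=None):
    """Extract text from a PDF inside a script instead of via the command line."""
    if extractor is None:
        # Assumption: pdfminer.six is installed (pip install pdfminer.six).
        # Imported lazily so this module still loads without it.
        from pdfminer.high_level import extract_text
        extractor = extract_text
    return extractor(path)


def has_unmapped_cids(text):
    """True if the extracted text still contains "(cid:N)" placeholders."""
    return "(cid:" in text
```

With a real file this would be called as `text = extract_text_from_pdf("sample.pdf")`, where `sample.pdf` is a placeholder path, followed by `has_unmapped_cids(text)` to see whether the CID problem occurs.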
---
This page is auto-translated from /nishio/CID問題. If you find something interesting but the auto-translated English is not good enough to understand, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.